Complexity analysis of matrix product on multicore architectures
Abstract
The multicore revolution is underway. Classical algorithms have to be revisited in order to take hierarchical memory layout into account. In this paper, we aim at minimizing the number of cache misses paid during the execution of the matrix product kernel on a multicore processor, and we show how to achieve the best possible trade-off between shared and distributed caches.
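The abstract does not spell out the algorithm, so the following is only a generic sketch of cache blocking (tiling) for the matrix product, not the paper's method. The function name `blocked_matmul` and the `block` parameter are hypothetical, chosen for illustration. The idea is that each tile of the operands is reused while it would still fit in cache, which is the kind of reuse that reduces cache misses.

```python
def blocked_matmul(a, b, block=2):
    """Multiply two square matrices (lists of lists) tile by tile.

    `block` is an illustrative tile size; in practice it would be chosen
    so that the tiles being worked on fit in the targeted cache level.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # Outer three loops walk over tiles of C, A, and B.
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Inner three loops multiply one tile of A by one tile of B
                # and accumulate into the matching tile of C.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(blocked_matmul(a, b, block=1))
```

The result is the same as that of the classical triple loop; only the order in which entries are visited changes. On a multicore machine, one plausible design is to size tiles for the shared cache and sub-tiles for each core's private cache, but the exact trade-off is what the paper itself analyzes.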
Similar resources
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communication for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
Analyzing Performance and Power of Multicore Architecture Using Multithreaded Iterative Solver
Problem statement: Scientific modeling and simulations are widely used alongside experiments and theoretical analysis in the science and engineering communities. Approach: Consequently, computational demands are growing exponentially to support large-scale modeling and simulations. Results: As a result, multicore computing architectures have been proposed and several products are already availabl...
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems
To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decompose...
Parallel Implementation of Interval Matrix Multiplication
Two main and not necessarily compatible objectives when implementing the product of two dense matrices with interval coefficients are accuracy and efficiency. In this work, we focus on an implementation on multicore architectures. One direction successfully explored to gain performance in execution time is the representation of intervals by their midpoints and radii rather than the classical re...
Performance analysis and design of a hessenberg reduction using stabilized blocked elementary transformations for new architectures
The solution of nonsymmetric eigenvalue problems, Ax = λx, can be accelerated substantially by first reducing A to an upper Hessenberg matrix H that has the same eigenvalues as A. This can be done using Householder orthogonal transformations, which are a well-established standard, or stabilized elementary transformations. The latter approach, although having half the flops of the former, has bee...